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ABSTRACT 

There  are  three  types  of  data  association  problems.  The  first  is  the  observation-to-track  association  (OTTA)  problem, 
where  given  an  observation  with  some  known  measurement  statistics  and  a  set  of  existing  candidate  (uncertain)  res¬ 
ident  space  object  (RSO)  tracks  the  analyst  seeks  to  associate  each  observation  with  a  unique  track  (or  none).  The 
second  association  problem  is  where  we  have  multiple  tracks  at  different  time  instances  and  wish  to  determine  whether 
any  of  the  tracks  belong  to  the  same  RSO.  This  is  the  track-to-track  association  (TTTA)  problem.  The  final  association 
problem  is  where  we  are  given  a  set  of  observations  at  different  time  instances  and  wish  to  determine  which  of  these 
observations  were  generated  by  the  same  RSO.  This  is  the  observation-to-observation  association  (OTOA)  problem. 
The  focus  of  our  paper  is  the  OTOA  problem.  In  this  paper,  we  tackle  the  OTOA  problem  by  using  an  appropriate 
initial  orbit  determination  (IOD)  method  as  well  as  criteria  from  information  theory.  The  two  main  criteria  we  use  in 
this  paper  are  mutual  information  and  information  divergence.  We  demonstrate  how  these  two  criteria  can  be  used 
within  an  unscented  transform  framework  as  well  as  with  a  particle-based  approach.  The  information  theoretic  solu¬ 
tions  described  in  this  paper  can  be  adjusted  to  address  the  other  (OTTA  and  TTTA)  association  problems,  which  will 
be  the  focus  of  future  research.  We  will  demonstrate  the  main  result  in  simulation. 


1.  INTRODUCTION 

Consider  a  set  of  indistinguishable  objects  moving  continuously  under  the  influence  of  a  common  set  of  deterministic 
dynamics  and  stochastic  environmental  influences.  One  or  more  of  these  objects  appear  randomly  in  the  field  of  view 
(FOV)  of  one  or  more  sensors  (i.e.  they  are  detectable  above  the  background  sensor  noise).  While  these  objects 
persist  in  the  sensor  FOV  and  remain  detectable,  the  sensor  provides  a  set  of  noisy  measurements  of  the  object  states 
and  their  time  stamp,  which  typically  includes  a  subset  of  false  detections  and  clutter.  The  essence  of  the  multi¬ 
object  tracking  problem  is  to  find  tracks  from  these  noisy  sensor  measurements  and  to  rule  out  clutter  from  resident 
space  objects  (RSOs).  The  literature  is  replete  with  techniques  on  state  estimation  if  the  sequence  of  measurements 
associated  with  each  object  is  known.  However,  the  association  between  measurement  observations  and  objects  is  not 
always  known,  leading  to  the  well  known  problem  of  un-correlated  tracks  (UCTs)  when  attempting  to  update  the  space 
catalog  of  observed  RSOs.  The  crux  of  modern  space  surveillance  from  an  algorithmic  point  of  view  is  to  solve  the 
data  association  problem  and  determine  which  measurements  were  generated  by  which  objects. 

In  general,  there  are  three  types  of  data  association  problems.  The  first  is  the  observation-to-track  association  (OTTA) 
problem  described  above,  where  the  analyst  seeks  to  associate  each  observation  with  a  unique  track  (or  none)  given 
an  observation  with  some  known  measurement  statistics  and  a  set  of  existing  candidate  (uncertain)  resident  space 
object  (RSO)  tracks.  The  second  association  problem  is  where  we  have  multiple  tracks  at  different  time  instances 
from  one  or  more  sensors  and  wish  to  determine  whether  any  of  the  tracks  belong  to  the  same  RSO.  This  is  the  track- 
to-track  association  (TTTA)  problem.  The  final  association  problem  is  where  we  are  given  a  set  of  observations  at 
different  time  instances  and  wish  to  determine  which  of  these  observations  were  generated  by  the  same  RSO.  This  is 
the  observation-to-observation  association  (OTOA)  problem,  which  this  paper  will  focus  on. 

Most  methods  for  observation-to-track  data  association  fall  into  one  of  two  broad  categories:  multiple  hypothesis 
tracking  (MHT)  and  joint  probabilistic  data  association  (JPDA).  MHT  has  been  studied  extensively  in  the  literature 
and  is  a  methodology  that  uses  some  or  all  reasonable  measurements  to  update  a  track  and  delay  the  decision  on  which 
observation  was  correctly  associated  to  a  track  until  all  measurements  have  been  processed.  Alternatively,  JPDA 
calculates  a  weighted  sum  of  reasonable  observations  in  order  to  create  and  update  a  track  [1].  These  weights  are 
the  probability  that  the  observation  originated  from  a  specific  object,  also  known  as  the  probability  of  detection  or 
appearance  probability.  While  conceptually  attractive,  both  MHT  and  JPDA  suffer  from  the  curse  of  dimensionality 
where  the  computational  burden  increases  exponentially  as  the  number  of  potential  objects  increases.  Modern  im- 
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plementations  of  near  optimal  JPDA  algorithms  and  multi-frame  adaptations  of  MHT  have  recently  been  introduced 
which  run  in  polynomial  time,  greatly  improving  the  computational  effectiveness  of  JPDA  based  approaches  [2-5]. 
For  an  extensive  literature  review  and  discussion  of  the  nuances  of  different  MHT  and  JPDA  techniques,  please  refer 
to  Ref.  [6].  For  an  extensive  discussion  of  JPDA  as  applied  to  classical  filtering  (JPDAF),  please  refer  to  Ref.  [7]. 

Many  data  association  techniques  rely  upon  an  assumption  of  Gaussianity  of  measurement  errors;  however,  this  is 
not  typical  of  the  space  surveillance  problem.  For  this  reason,  particle  filter-based  methods  are  attractive  for  data 
association  but  they  have  the  significant  disadvantage  of  being  computationally  expensive  due  to  the  fact  that  they  are 
based  upon  Monte  Carlo  sampling  approaches.  In  the  broader  multiobject  multitracking  community,  Markov  Chain 
Monte  Carlo  (MCMC)  methods  have  been  proposed  to  reduce  the  computational  load  [6]  but,  until  recently,  the  space 
surveillance  community  has  largely  shied  away  from  these  types  of  approaches  because  the  full  implications  of  proper 
statistical  sampling  are  yet  to  be  fully  understood  [8]. 

The  track-to-track  problem  can  be  thought  of  as  the  problem  where  one  track  is  propagated  forward  to  the  time  of 
the  subsequent  track  and  then  determining  an  efficient  gating  strategy  for  associating  the  two  tracks  [9].  The  classic 
space  object  cataloging  approach  has  used  fixed  rectangular  gates  in  Cartesian  position  and  velocity  space  leading  to 
the  generation  of  large  numbers  of  UCTs  as  the  space  object  population  has  grown  significantly  in  recent  decades. 
This  fixed  gating  approach  was  improved  upon  recently  by  Alfriend  and  then  Hill  with  the  introduction  of  covariance 
based  track  association  (CBTA)  [10, 11].  CBTA  has  been  shown  to  be  more  effective  than  classic  gating  techniques 
but  is  fundamentally  constrained  by  the  validity  and  realism  of  the  2nd  moment  of  the  measurement  errors.  In  many 
cases,  sensors  belonging  to  the  Space  Surveillance  Network  (SSN)  do  not  provide  individual  observation  or  track 
errors  requiring  the  analyst  to  construct  an  appropriate  representation  and  hope  that  it  is  adequate  to  perform  CBTA. 

This  paper  considers  the  final  problem  of  observation-to-observation  association.  This  problem  can  be  thought  of  as 
determining  the  statistical  dependence  of  observations  of  which  there  are  numerous  metrics  to  choose  from.  A  recent 
approach  combines  an  adaptive  Gaussian  sum  filter  with  the  Kullback-Leibler  (KL)  divergence  measure  for  effective 
data  association  [12].  However,  the  KL  divergence  does  not  satisfy  all  the  properties  of  a  true  distance  metric  making 
analysis  of  results  all  the  more  challenging  in  addition  to  the  fact  that  computing  the  KL  divergence  is  computationally 
demanding.  In  general,  we  want  the  chosen  statistical  dependence  metric  between  two  observations  to  identify  a  non¬ 
linear  higher-than-second  order  dependence  between  measurements  in  order  to  claim  a  pair-wise  association  between 
observations.  Mutual  information  (MI)  has  been  identified  in  the  literature  as  the  most  promising  metric  of  statistical 
dependence  for  OTOA  but  there  are  alternative  non-parametric  techniques  such  as  Kendall  tau.  Cross  Correlograms, 
and  Independent  Component  Analysis  that  could  also  be  considered  [13].  Ideally,  the  measure  of  statistical  dependence 
should  be  valid  without  any  assumptions  of  an  underlying  probability  density  function  and  should  be  extended  to  high 
dimensionality  of  input  measurements.  In  this  paper,  we  further  explore  both  information  divergence  (ID)  and  mutual 
independence  indexes  for  the  OTOA  problem. 

In  this  paper  we  demonstrate  the  promise  of  these  indexes  for  data  association  and  their  efficacy  in  solving  a  simple 
OTOA  problem  with  two  closely  spaced  RSOs.  In  Section  2.,  we  first  describe  the  overall  procedure  and  how  it 
relates  to  initial  orbit  determination  (IOD).  In  Section  3.,  we  describe  the  various  information  theoretic  indexes  one 
may  consider.  We  demonstrate  efficacy  of  the  methods  in  a  numerical  example  in  Section  4.  We  summarize  the  main 
results  of  the  paper  in  Section  5. 


2.  INITIAL  ORBIT  DETERMINATION  AND  THE  OTOA  PROBLEM 

The  core  idea  in  the  proposed  OTOA  approach  is  to  use  an  appropriate  initial  orbit  determination  (IOD)  method 
to  generate  an  uncertain  track  from  a  set  of  measurements  (that  we  wish  to  test  for  association)  and  their  known 
statistics  (see  Fig.  (1)).  One  can  then,  for  example,  compare  (in  some  information  theoretic  sense)  the  amount  of 
information  shared  between  the  generated  orbit  statistics  and  the  measured  output  statistics.  The  more  consistent 
the  estimated  track  is  with  the  measurements,  the  more  likely  the  observations  were  generated  by  the  same  physical 
phenomenon.  One  may  also  compare  the  degree  of  consistency  between  the  generated  orbit’s  output  and  the  measured 
output  statistics.  The  former  method  of  comparison  is  based  on  the  notion  of  mutual  information  between  the  IOD- 
based  orbit  statistics  and  the  measured  observation  statistics  (see  Fig.  (2)).  The  second  method  is  based  on  some  notion 
of  information  divergence  or  “distance”  between  the  measured  observation  statistics  and  the  IOD-based  reconstructed 
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observation  statistics  (see  Fig.  (3)).  Clearly,  the  performance  of  the  proposed  data  association  schemes  depends  on  the 
performance  of  the  underlying  IOD  method.  We  will  describe  in  detail  the  two  approaches  in  the  next  section. 


Generated  state 


Map  uncertainty  in 
measurements  to 
uncertainty  in  orbital  state 
at  time  t,  using  probabilistic 
Gauss  IOD 


( 


3  Angles-Only  Measurements  and  their 
statistics  in  a  total  of  6-dimensional  space 

Fig.  1.  The  general  probabilistic  IOD  approach  developed  by  Hussein  et  al.  [14]. 


For  illustration  purposes,  we  will  assume  that  the  observations  are  angles-only.  We  will  use  the  classical  Gauss  method 
approach  for  the  IOD  step  (see  Ref.  [15]).  In  general,  assume  we  are  given  a  set  of  n  observations  { Zj . . . . ,  zn}  taken 
at  observation  times  ti,. .. ,tn  (times  are  assumed  distinct,  without  any  loss  of  generality).  Of  these  we  will  choose  3 
observations  (as  required  by  the  Gauss  method)  to  test  whether  they  were  generated  by  the  same  RSO.  The  procedure 
is  applied  to  all  combinations  of  observations.  How  to  computationally  address  the  combinatorics  problem  is  beyond 
the  scope  of  this  paper  and  we  will  only  address  the  problem  of  determining  whether  a  set  of  three  observations 
were  generated  by  the  same  RSO.  But  in  the  general  n  observation  problem,  those  observations  that  were  deemed 
to  be  associated  will  now  form  tracks  and  the  remainder  of  this  problem  becomes  a  TTTA  problem  that  results  in 
“stringing”’  these  tracks  together  to  form  a  set  of  non-redundant  new  tracks.  We  do  not  address  the  TTTA  problem 
here. 

Given  three  distinct  measurements  z  =  (z1  ,z2,z 3)  taken  at  t\,t2,  and  f3,  the  Gauss  IOD  method  produces  a  candidate 
orbit  described  by  the  six-dimensional  state  x2  =  g  (z)  at  time  t2,  where  g  (•)  is  the  function  that  maps  a  set  of  three 
angles-only  measurements  to  orbital  space  coordinates.  The  state  may  be  specified  in  orbital  elements,  Cartesian 
coordinates,  etc.  Furthermore,  let  be  the  function  that  propagates  the  state  xr  defined  at  time  t,  to  the  state  Xj 
defined  at  time  tj.  Then  X\  =  /Vi  {x2)  is  the  (backward)  propagated  state  at  time  and  cc3  =  fx>,(x2)  is  the 
propagated  state  at  time  f3  given  the  state  x2  at  time  t2.  Finally,  let  h,  be  the  function  that  maps  the  state  at  time  t,  to 
an  observation  Zj  =  hi(xi). 

In  order  to  analyze  the  uncertainty  in  the  orbital  space  resulting  from  a  given  uncertainty  in  the  measurement  space, 
we  will  use  both  the  unscented  transform  (UT)  and  the  Monte  Carlo  (MC)  method  (see  Hussein  et  al.  [14]).  For  the 
UT  analysis,  the  measurement  process  is  assumed  Gaussian  and  the  generated  orbit  uncertainty  will,  therefore,  also 
be  Guassian.  Likewise,  the  measurement  statistics  in  the  MC  approach  will  be  assumed  Gaussian.  5000  particles  are 
generated  from  the  6-dimensional  measurement  uncertainty  probability  density  function  (pdf)  \>m  {z)  and  mapped  into 
orbital  space  coordinates  at  the  time  of  the  second  measurement  t2.  The  resulting  Gaussian  distribution  from  the  UT 
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statistics 

Fig.  2.  Mutual  information  can  be  used  as  an  index  of  how  much  dependence  exist  between  a 
set  of  measurements  and  the  orbit  they  generate  if  they  were  associated.  The  “more”  unasso¬ 
ciated  the  observations  are  the  smaller  the  mutual  information  will  be,  and  vice  versa. 


analysis  and  the  particle  cloud  from  the  MC  analysis  in  the  orbital  space  are  approximations  of  the  actual  uncertainty 
pdf  Pq  (x-j  )  of  the  IOD  solution.  In  the  UT  method,  there  are  a  total  of  13  =  2  x  6  +  1  sigma  points  {Z(j> }  since  the 
measurement  space  has  a  dimension  of  6. 


As  noted  by  Hussein  et  al.  [14],  there  is  a  subtle  difference  between  this  description  of  the  problem  compared  to 
other  approaches  which  have  appeared  previously  in  the  literature.  The  nonlinear  mapping  g  (•)  is  a  map  from  the 
entire  three-measurement  space  to  the  orbital  space,  and  is  not  a  map  of  an  individual  measurement  zt,  i  =  1,2,3, 
to  the  orbital  space.  Therefore,  samples  of  the  measurement  uncertainty  should  be  drawn  from  the  distribution  in 
the  six-dimensional  measurement  space  defined  with  the  global  measurement  variable  z  and  not  from  the  individual 
distributions  defined  on  the  individual  measurement  variables  Zi,  i  =  1,2,3.  For  the  UT  method,  in  particular,  this 
will  result  in  the  correct  number  (13)  of  sigma  points  being  generated  to  describe  the  uncertainty  distribution  in  six¬ 
dimensional  orbital  space. 


For  example,  an  MC  sample  and  UT  sigma  points  are  shown  in  Fig.  (4)  for  a  Gaussian  uncertainty  with  each  dimension 
independently  distributed  with  3cr  =  10  arcsec.  Each  colored  set  of  dots  represents  a  two  dimensional  projection  of 
the  MC  sample  onto  each  individual  measurement  plane  (i.e.,  there  are  5000  x  3  =  15  000  dots  shown,  but  the  MC 
sample  only  contains  5000  particles).  The  remaining  dimensions  of  the  sigma  points  overlap  each  other  since  the 
uncertainty  is  the  same  in  each  direction.  The  histograms  show  the  marginal  distributions  of  the  MC  azimuth  and 
elevation  uncertainty  samples  for  one  of  the  measurement  times. 
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Fig.  3.  Information  divergence  can  be  used  to  measure  how  different  the  measurement  statis¬ 
tics  are  from  that  of  the  reconstructed  measurement  statistics  under  the  hypothesis  that  the 
observations  are  associated. 


3.  INFORMATION  THEORETIC  CRITERIA  FOR  OTOA 
3.1  Information  Divergence 


Assuming  that  the  measurements  are  statistically  independent,  let  pm(zi,  z2,  z3)  =  plM{zf)p2M{zf)p\I{z3)  be  the 
joint  pdf  in  the  three  measurements.  With  abuse  of  notation,  \etp20(x2)  =  9(pm(zi,  Z2, 23))  be  the  orbital  IOD-based 
reconstructed  pdf  at  t2,  and  Pq(x  1)  =  f2i(Po(xz))  and  Pofas)  =  f23(Poix2))  be  the  propagated  pdfs  at  times  t\ 
and  £3,  respectively.  Finally,  let  plM{zi)  =  h(pQ(xi))  be  the  uncertainty  in  the  measurement  as  mapped  from  the 
uncertainty  in  the  state  at  time  £;.  If 

1.  all  three  observations  were  generated  by  the  same  RSO, 

2.  the  IOD  method  perfectly  produces  the  exact  orbit  that  generated  the  measurements,  and 

3.  the  orbital  propagation  and  measurement  models  perfectly  match  the  true  orbital  motion  and  measurement  pro¬ 
cesses 

then  plM(zi )  should  be  equal  to  p'M ( z, ) .  In  this  paper  we  will  assume  that  the  third  condition  holds.  However,  the 
second  condition  is  guaranteed  not  to  be  true  since  the  Gauss  IOD  method  is  not  exact.  Therefore,  any  mismatch 
between  plM  and  plM  would  be  the  result  of  the  IOD  method’s  errors  or  if  the  hypothesis  that  the  three  observations 
were  generated  by  the  same  RSO  does  not  hold.  The  key  insight  is  that  among  the  different  associations  between  a 
set  of  measurements  {z\, . . .  ,  znj  the  ones  that  result  in  the  least  amount  of  discrepancy  betw’een  plM  and  plM,  above 
and  beyond  the  discrepancy  introduced  by  the  IOD  method  that  is  shared  among  all  associations*,  are  the  more  likely 
ones  to  be  true  associations.  The  key  question  is:  How  can  one  measure  the  discrepancy  between  two  pdfs? 

*This  assumes  that  the  performance  of  the  IOD  method  is  independent  of  the  measurements  and  their  statistics.  To  the  best  of  our  knowledge, 
such  a  dependence  has  not  been  studied  in  the  literature. 


DISTRIBUTION  STATEMENT  A.  Approved  for  public  release.  Distribution  is  unlimited. 


-15  -10  -5  0  5  10 

Right  Ascension  Error,  arcsec 

Fig.  4.  Example  of  sampling  a  six-dimensional  Gaussian  measurement  uncertainty. 


One  such  method  is  the  use  of  information  divergence  [16].  While  there  are  many  definitions  of  information  divergence 
[17, 18],  we  will  use  the  Kullback-Leibler  (KL)  divergence  [16].  The  KL  divergence  between  two  pdfs  p(x)  and  q(x) 
is  given  by: 


DKL(p(x)\\q(x)) 


(1) 


If  both  p  and  q  are  Gaussian,  then  one  can  compute  DKL(p\\q)  in  closed  form  and  is  given  by  (see  for  example. 
Ref.  [12]) 


DKL(p{x)\\q(x)) 


d  +  (mp  -  Mg)  •  sg  1  '  (Mp  -  Mg)  , 


(2) 


where  fip  and  (resp.,  ptq  and  Sg)  are  the  mean  and  covariance  of  the  pdf  p  (resp.,  q),  and  where  d  is  the  dimension 
of  the  underlying  space.  In  the  case  where  p  and  q  are  Gaussian  mixtures,  there  is  no  known  closed-form  expression 
for  the  KL  divergence.  However,  other  definitions  of  divergence  can  be  used  to  obtain  a  closed-form  expression.  One 
such  divergence  is  the  Cauchy-Schwarz  divergence  [19,20]. 


3.1.1  Unscented  Transform:  Following  the  UT  procedure  described  in  Ref.  [14],  one  starts  by  assuming  Gaus¬ 
sian  statistics  for  the  observations:  plM(zi)  =  pg(zi\ pz ,  £Zi),  where  pg(r;  fir,Hr)  is  a  Gaussian  distribution 
in  r  with  mean  pr  and  covariance  Sr.  Using  the  Gauss  IOD  method,  one  can  obtain  a  Gaussian  approxima¬ 
tion  Pq(x2)  =  pg{x 2;  m:,.2 ■  SX2)  of  the  pdf  Pq(x2).  As  one  would  normally  do  in  an  Unscented  Kalman  Filter 
(UKF),  this  pdf  can  be  propagated  backward  (respectively,  forward)  in  time  to  obtain  the  Gaussian  pdf  approximation 
Po(x  1)  =  pg(x  1;  fiXi ,  SXl)  (respectively,  Pq{x3 )  =  pg(x 3;  fiX3,  SX3)).  From  these  and  using  the  assumed-known 
measurement  model,  one  can  generate  the  pdf  for  the  measurements  at  these  three  times  based  on  the  IOD-based  orbit 
under  the  assumption  that  the  observations  are  associated:  plM  (^Zi\filZi,T!iz^j  =  h(pl0(xi)),  i  =  1,2,3.  We  may 

now  compute  the  information  divergence  DlKL{pLM{zi)\\pLM{z))  between  plM  and  p'M  for  each  of  the  three  times  f,:, 
i  =  1,2,3.  The  overall  criterion  for  assessing  association  would  then  be 

3 

^kl  ='^JDlKL{pLM{zi)\\p1’M{z)).  (3) 

i=i 

The  association  that  maximizes  this  index  among  all  possible  combinations  of  three  angles-only  measurements  is 
one  way  to  assess  association.  We  call  this  solution  to  the  OTOA  problem  the  UT-ID  solution  to  be  referenced  in  the 
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numerical  example  section  later  in  the  paper.  This  index  is  similar  in  spirit  to  the  divergence-based  (OTTA)  association 
discussed  in  Ref.  [12]. 

3.1.2  Particle  Based  Methods:  The  underlying  assumption  of  Gaussianity  in  the  UT  approach,  while  analytically 
appealing,  is  not  necessarily  a  faithful  representation  of  the  uncertainty  in  the  reconstructed  state  from  the  IOD  method. 
A  more  accurate  representation  would  be  to  represent  uncertainty  using  a  Monte  Carlo  (MC)  particle  cloud.  However, 
computing  divergence  based  on  a  particle  cloud  poses  analytical  challenges.  To  see  this,  consider  a  particle  cloud 
{Z®}  with  weights  drawn  from pM (zi ,  z2 ,  z3)  =  pi (z1 ;  /xZl ,  £Zl )p2g (z2 ;  )P3g (z3 ;  ^Z3 ,  SZ3 ) .  This 

cloud  can  be  mapped  into  the  state  space  using  the  Gauss  IOD  method  and  that,  in  return,  mapped  to  the  measurement 
space  using  the  observation  function  h(-).  Denote  the  resulting  cloud  by  { Z-1'1 }  with  weights  j  wf !  j .  If  the  IOD 

method  was  exact  and  the  motion  and  measurement  models  were  known  perfectly,  the  particle  set  {Z^}  should  be 
identical  to  the  set  {Z^)  }  and  the  KL  divergence  can  be  shown  to  be: 


-  EPi,(zi) 


DKL{plM{Zi)\\PM{z)i)  =  I  log 

log 

-XI*'  lo§ 

i 

=  wl  log 

i 

=  wl  log 


Pm&) 

P\t(zi) 
PM(zi ) 


dz. 


PM(zi)J. 

Pm(Z{z))\ 

PhiZ^) ) 

’Zkwkszw  (-2(i)) 


=  Sr  =°. 


which  is  what  we  would  expect  under  perfect  knowledge  of  the  system  model  and  with  a  perfect  IOD  method.  How¬ 
ever,  because  of  errors  in  the  IOD  solution,  the  two  sets  {Z^l }  and  { Z1'1 }  will  not  match  and  the  delta  functions  in  the 
denominator  in  the  previous  equation  will  evaluate  to  zero  for  all  i  and  we  will  get  I) k t.  oc  log(l/0)  =  log  oo  =  oo. 
Even  if  one  uses  other  definitions  of  divergence,  the  value  of  the  divergence  would  not  correspond  to  the  actual  di¬ 
vergence  between  plM(zi )  and  plM(z).  Note,  however,  that  if  Pm  A  analytically  known,  can  be  sampled  and  can  be 
evaluated,  then  information  divergence  can  be  computed  when  plM  is  represented  by  a  cloud  of  particles.  We  will 
demonstrate  this  in  a  future  publication.  We  point  out,  however,  that  in  the  next  subsection  we  will  be  able  to  com¬ 
pute  the  mutual  information  between  two  particle  clouds  and  not  run  into  the  issue  mentioned  above  for  information 
divergence. 

An  alternative  approach  to  computing  the  divergence  directly  using  the  particle  cloud  is  to  approximate  the  particle 
cloud  using  some  analytic  pdf  for  which  we  can  compute  the  divergence.  One  such  approach  would  be  to  use  the 
expectation  maximization  algorithm  to  convert  the  particle  cloud  into  a  Gaussian  Mixture  Model  [21].  We  will  not 
describe  the  details  of  the  method  here.  Instead  we  refer  the  reader  to  an  EM  algorithm  described  in  Ref.  [22],  which 
is  a  robust  and  versatile  EM  algorithm,  called  the  FJ-EM  algorithm.  The  method  allows  for  the  user  to  specify  a 
maximum  number  of  GMM  components  and  it  selects  the  number  of  components  that  best  represent  the  particle 
cloud.  The  user  is  also  not  required  to  choose  a  good  initial  guess  of  the  components.  In  other  words,  the  method  does 
not  require  a  careful  initialization  of  the  algorithm  (other  EM  algorithms  do  require  a  very  careful  initialization.) 

As  an  illustration,  consider  the  2000  particle  cloud  generated  from  the  four  component  planar  GMM  with  weights: 

vh  =  0.25,  i  =  1,2, 3, 4, 
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means: 


and  covariances: 


AXi  = 

[10.0 

0.0]T 

7*2 

[-10. 

0  0.0]T 

/' 

[0.0  10.0]T 

/'  :  = 

[0.0 

-  10.0]r 

2.0 

0.0  ' 

A  _ 

0.0 

2.0 

?  ^  — 

The  generated  particle  cloud  was  then  fed  into  the  EM  algorithm  and  the  following  5-component  GMM  initial  guess 
was  used  for  initializing  the  algorithm: 

wl  =  0.1 
w%  =  0.4 
w%  =  0.2 

wl  =  0.2 
w%  =  0.1 


/x?  =  [1.0  0.0]T 
ft?  =  [—1.0  0.0]T 
At?  =  [0.0  1.0] T 
At?  =  [-1.0  -  1.0]T 

^5  =  [—5.0  —  5.0] T 


The  algorithm  resulted  in  the  following  set  of  GMM  weights: 


w{  =  0.25 
w[  =  0.24849 
w{  =  0.26156 
w{  =  0.23995 

U>5  =  0.0 


where  we  notice  that  one  component  has  been  eliminated.  The  final  set  of  means  were  found  to  be 

At{  =  [9.96428  0.02116]T 
At£  =  [-9.87450  -  0. 02471] T 
=  [0.03449  10.06634] T 
At{  =  [0.04656  -  9. 96678] T 
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and  the  final  set  of  covariances  were  found  to  be 


1.92951 
0.02156 

2.10782 

-0.04989 


= 

= 

= 


1.89439 

0.05116 

1.99178 

0.01623 


0.02156 

1.92172 

-0.04989 

1.92587 

0.05116 

2.35834 

0.01623 

1.85653 


The  resulting  GMM  is  graphically  shown  in  Fig.  5  against  the  particles.  Note  that  a  component  has  a  weight  of  zero 
(i.e.,  eliminated)  resulting  in  an  effective  number  of  components  of  4.  While  the  true  values  of  the  weights,  means  and 
covariances  were  not  recovered  exactly,  the  error  in  these  parameters  is  quite  small. 
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Fig.  5.  An  example  showing  the  original  means  (equally  weighted)  that  generated  the  shown 
particle  cloud.  The  cloud  was  then  fed  into  the  FJ-EM  algorithm  that  generated  the  GMM 
approximation  shown.  The  FJ-EM  algorithm  was  able  to  very  accurately  reconstruct  the  true 
GMM  that  generated  the  particle  cloud. 


To  summarize  starting  with  the  two  particle  clouds  {,23*)}  and  {23*)},  one  can  convert  each  into  a  GMM  to  ap¬ 
proximate  pm(zi)  and  pM(zi),  i  =  1,  2, 3.  There  is,  however,  no  known  closed-form  expression  for  KL  divergence 
between  two  GMMs.  The  Cauchy-Schwarz  (CS)  divergence,  however,  can  be  computed  for  GMMs  [19,20].  The 
general  expression  for  CS  divergence,  Kqs ,  between  two  pdfs  p(x)  and  q(x)  is  given  by 


DCs(p{x)\\q{x) 


log 


(/ p2(x)dx )  (/  q2{x)dx) 


(f  P(x)q(x)da 


Our  ability  to  compute  the  CS  divergence  for  GMMs  in  closed-form  lies  in  the  fact  that  ifp(x)  =  wtPg(xj  Mp?  Ep) 
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and  q(x)  =  wlpg{x]  p * ,  S^)  one  can  show  that 


This  expression  can  be  used  to  compute  f  p2(x)dx  and  f  q2(x)dx,  and  hence  Dcs{p{x) \\q(x))  when  both  p  and  q 
are  GMMs. 

Thus,  the  two  particle  clouds  {Zll> }  and  are  first  converted  to  GMMs  using  the  FJ-EM  algorithm,  which 

are  then  subsequently  used  to  compute  the  approximate  CS  divergence  between  plM(zh  and  p'M(z,).  The  overall 
divergence-based  index  for  the  MC  case  is  the  sum  of  the  three  divergences: 

3 

Dbots  ~  -Pcs(Pm(2:»)IIpm(z»))-  (4) 

i= 1 

We  call  this  solution  to  the  OTOA  problem  the  MC-ID  solution  to  be  referenced  in  the  numerical  example  section  later 
in  the  paper. 

3.2  Mutual  Information 

The  mutual  information,  /(at,  2),  between  two  random  variables  x  and  z  is  a  measure  of  the  degree  of  dependence 
between  these  two  variables.  Formally,  it  is  given  by 

1{x • z)  =  /  / p(x- z)  log  ) ixiz ■  (5) 

where  p(x,  z)  is  the  joint  distribution  of  x  and  z  and  where  p(x),  respectively  p(z),  is  the  marginalization  of  p(x,  z) 
with  respect  to  z,  respectively  x.  Two  important  properties  are  to  be  noted  here.  Firstly,  mutual  information  is  a 
symmetric  function  of  x  and  2.  Secondly,  if  x  and  2  are  independent  then  p(x,  z)  =  p(x)p(z)  and  the  mutual 
information  is  zero. 

The  mutual  information  index  we  propose  to  use  is  as  follows.  First,  consider  the  mutual  information  h(xi,Zi) 
between  the  state  Xi  and  the  measured  output  2,  at  time  t%.  The  overall  mutual  information  index  would  then  be 
hot  =  h  +1-2  + 1.1,-  Other  indexes  based  on  mutual  information  can  also  be  considered.  For  example,  one  can  consider 
the  mutual  information  between  the  joint  orbital  state  variable  atj0int  =  (x\  ,x2,  x:i)  and  the  joint  measurement 
variable  2j0int  =  (zi,  22, 23).  Such  indexes  will  be  the  subject  of  future  research. 

It  can  be  shown  that  the  mutual  information  It(xi,  z{)  can  be  expressed  in  terms  of  the  KL  divergence: 


h(xi,Zi)  =  DKL{p(xilZi)\\p{xi)p(zi)).  (6) 

Therefore,  if  the  joint  pdf  p(xi,  zj)  is  Gaussian,  one  can  compute  Ii  in  closed-form.  However,  for  the  GMM  case,  no 
closed-form  solution  exists  because  the  mutual  information  is  related  to  the  KL  divergence  and  not  a  divergence  for 
which  a  GMM  has  a  closed-form  solution. 


3.2. 1  Unscented  Transform:  Following  the  UT  procedure  described  in  Ref.  [  14],  one  can  obtain  a  Gaussian  approx¬ 
imation  of  the  pdf  of  the  state  x2.  One  can  then  use  the  UKF  to  obtain  the  joint  distribution  p{xil  2, )  after  propagating 
and  updating  with  the  corresponding  measurement  2,;,  1  =  1,2.  3.  It  can  be  shown  that  this  distribution  is  Gaussian 
with  the  following  mean: 


and  covariance: 


DISTRIBUTION  STATEMENT  A.  Approved  for  public  release.  Distribution  is  unlimited. 


where  fix  is  the  UKF  updated  state  mean  at  time  ti,  pz  is  the  measurement  mean  at  time  U,  is  the  covariance 
in  the  state  at  time  ti,  is  the  measurement  covariance  at  time  t,,  and  is  the  cross-covariance  between  the 

state  and  the  measurement  at  time  i,;. 


Since  the  joint  pdf  is  Gaussian,  the  marginalization  with  respect  to  x,  and  zt  are  both  Gaussian  and  have  means 
and  pz  . ,  respectively,  and  covariances  and  £z.,  respectively.  Now  that  we  have  Gaussian  approximations  of 
p(xi,Zi )  and  p(xi)p(zi),  one  can  use  Eq.  (6)  and  Eq.  (2)  to  compute  the  mutual  information  Il{xi,Zi).  We  call  this 
solution  to  the  OTOA  problem  the  UT-MI  solution  to  be  referenced  in  the  numerical  example  section  later  in  the  paper. 
This  index  is  similar  in  spirit  to  the  mutual  information-based  (OTTA)  association  discussed  in  Ref.  [12]. 


3.2.2  Particle  Based  Methods:  As  mentioned  previously,  converting  particle  clouds  to  GMMs  to  compute  MI  is  not 
feasible  since  MI  is  defined  in  terms  the  KL  divergence  Eq.  (6)  for  which  we  do  not  have  a  closed-form  GMM-based 
expression.  Rather  surprisingly,  however,  we  can  actually  use  the  particle  cloud  to  compute  a  faithful  evaluation  of 
the  mutual  information  index.  This  solution  is  motivated  by  the  MC  approach  in  Ref.  [23].  While  this  solution  is 
faithful  to  the  particle  nature  of  the  densities,  it  is  computationally  expensive.  At  the  time  of  publication  we  were 
in  the  process  of  parallelizing  the  particle-based  solution  on  a  graphics  processing  unit  (GPU).  However,  results  are 
not  readily  available  at  the  present  time.  Instead  we  will  describe  the  mathematics  behind  the  solution  and  provide  a 
non-SSA  example  to  verify  that  the  particle-based  expression  for  mutual  information  is  correct. 

Again,  we  begin  with  a  particle  cloud  {Z^}  with  weights  j  for  the  joint  distribution  Pm(zi,  Z2,  Z3)  = 

Pm(zi)p‘m(z2)Pm(z3).  This  cloud  can  then  be  fed  through  the  Gauss  IOD  method  to  produce  a  particle  cloud 
and  weights  j vMl  |  representing  the  uncertainty  in  the  state  at  time  t%.  This  cloud  can  then  be  propagated 

backward  and  forward  in  time  to  obtain  the  state  particle  clouds  and  (with  weights  |  Wx}  |  and 

| j,  respectively)  at  times  t\  and  t%,  respectively. 

We  begin  by  following  the  same  procedure  as  in  Ref.  [23]  by  noting  that 


Ii{xi,Zi )  =  H(xi )  -  H(xi\zi) 


where  H  (xi)  is  the  entropy  of  x.L  and  H  )  is  the  conditional  entropy  of  x,  given  z,  .  Notice  that  given  a  particle- 
based  distribution  plM(zi)  =  1  wi^zU)  (zi),  we  can  obtain  an  expression  for  pl0(xi)  by  marginalization  and 

using  the  basic  MC  integration  approximation  we  obtain: 
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Therefore,  the  entropy  H{xi)  is  given  by: 


H(xi) 


Po(xi)logpl0(xi)dxi 


N* 


'  N- 


J2wlP(Xi\Zi)  log  ^2WzP(Xi\Zi)  Axi 


vi— i 


u= 1 


'  N, 


Ew*  /  p{xi\Z\3’)  log  'Y^lwlzp(xi\Z\l))  da: 


o= i 
iv* 


i  =  l 


j= i 
JV*  JV® 


’p(*dzi3)) 


'  at* 


log 


J  =  1 


'AT* 


E<  E < los  E ™l*p(xr\ zn  , 

j= i  k=i  Vt=i  / 


(7) 

(8) 

(9) 

(10) 

(11) 


where  { X k  =  1, . . . ,  iVX4,  are  (assumed  to  be  the  same  for  all  j )  particles  drawn  ffomp(cCi|.Z(J9).  Account¬ 

ing  for  possible  errors  in  the  IOD  solution  and  dynamic  propagation  and  modeling  them  as  additive  noise,  we  have 
xi  =  f2i{g(zi,  Z‘2 -  Z3))  +  v,  where  v  is  an  additive  noise  term  with  some  density  pv(v).  In  this  case,  the  particles 
X.fO  are  drawn  such  that  X^3  =  f2i(g(Z^))  +  where  ~  pv(v)  and  where  Z^  =  (z[3\  Z^  \  Z^). 
However,  an  open  area  of  research  is  the  characterization  of  the  error  introduced  in  the  IOD  method.  If  one  ignores 
these  errors  and  assume  that  the  state  and  the  measurements  are  deterministically  related,  then  we  can  simply  propagate 
the  particles  Z^  through  the  various  transformations  without  sampling.  However,  this  will  result  in  NXi  identical 
particles  \X^J  }  which  can  then  be  simply  replaced  by  a  single  particle:  { X-  ^ }  =  f2i(g(Z ^1)). 


Following  a  similar  procedure  as  above  one  can  show  that 

Nz  N*i 

H{Xi\Zi)  =  -E<E<los(p(x"biza)))  <12) 

3= 1  k= 1 

One  can  now  use  Eq.  (7)  and  Eq.  (12)  to  compute  the  mutual  information  U{Xi,  Zi).  The  overall  index,  as  above, 
would  then  be  Ttot  —  I\  +  I2  +  Is- 


As  a  way  to  verify  that  the  above  equations,  we  consider  a  simple  non-SSA  problem  for  which  we  have  a  closed- 
form  solution  to  compare  against.  Let  some  variable  x  be  related  to  another  variable  z  via  the  linear  relationship: 
x  =  Hz  +  v,  where  z  ~  ps(/rz,  Sz)  and  v  ~  pg(fivl  S„).  One  can  use  Eq.  (2)  and  Eq.  (6).  It  was  found  that  the 
exact  mutual  information  is  given  by  5.5254529.  Fig.  (6)  shows  the  percentage  error  of  the  MC -based  computation 
from  the  true  MI  value.  While  performing  this  analysis,  it  was  found  that  the  accuracy  of  the  method  depended  heavily 
on  the  number  of  z  particles  and  was  very  insensitive  to  the  number  of  x  sub-samples.  In  the  figure  we  show  the 
percentage  error  for  a  fixed  number  of  x  samples  Nx  =  100.  As  can  be  seen  the  error  is  converging  to  zero  with  a 
larger  number  of  samples. 


4.  SIMULATION  RESULTS 

In  this  simulation  we  compare  the  performance  of  the  three  OTOA  solutions:  UT-ID,  MC-ID,  and  UT-MI.  The  MC-MI 
solution  described  at  the  end  of  last  section  will  be  tested  in  a  future  publication.  For  testing,  we  consider  two  objects, 
with  identification  numbers  0  and  1,  in  close  proximity  of  each  other.  They  both  have  the  same  orbital  elements  shown 
in  Table  1  except  for  the  value  of  true  anomaly  at  the  initial  time.  A  set  of  two  observations  from  the  two  RSos  are 
collected  at  three  different  times.  If  we  arbitrarily  index  the  two  measurements  with  0  and  1,  then  the  question  is 
which  sequence  of  tags  are  the  correct  ones?  There  are  eight  possible  combinations  of  tags:  000  (i.e.,  observations 
with  tags  0  at  the  three  time  instances  are  associated  to  one  of  the  two  objects  and  so  on),  001,  010,  100,  011,  101,  1 10 
and  111.  The  observations  were  tagged  such  that  the  two  correct  ones  are  000  (all  coming  from  RSO  number  0)  and 
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Fig.  6.  Percentage  error  from  true  value  vs.  number  of  z  samples  in  the  MC-based  MI 
computation  for  the  linear  Gaussian  problem. 


Ill  (all  coming  from  RSO  number  1).  The  criterion  to  evaluate  the  three  OTOA  methods  will  be  how  close  the  two 
objects  are  (in  true  anomaly)  before  the  method  being  tested  fails  to  return  the  two  correct  associations  as  the  two  most 
likely  ones.  Note  that  when  a  method  “fails”,  while  the  correct  associations  may  not  be  the  one  most  highly  ranked, 
they  would  rank  very  close  to  the  top  and  as  the  separation  distance  between  the  two  objects  decreases  further,  the 
correct  association  are  further  from  being  top  ranked  and  are  more  or  less  arbitrarily  ranked  as  all  solutions  become 
indistinguishable.  The  sensor  is  assumed  to  be  the  Socorro,  NM,  ground  sensor.  Topocentric  azimuth  and  elevation 
observations  were  collected  at  the  rate  of  one  observation  every  20  minutes.  The  measurement  noise  is  assumed  to  be 
Gaussian  with  an  angular  standard  deviation  of  2  arcsec  for  both  azimuth  and  elevation. 


Table  1.  Orbital  Elements  of  the  True  Orbit 


Orbital  Element 

Value 

Semi-major  Axis  (km) 

26  600. 

Eccentricity 

0.2 

Inclination  (deg) 

55.0 

Perigee  (deg) 

-120.0 

Right  Ascension  of  the  Ascending  Node  (deg) 

-13.24 

For  each  of  the  methods,  the  separation  in  angular  anomaly  between  the  two  RSOs  is  steadily  decreased  until  the 
method  fails  to  report  the  correct  associations  as  the  most  likely  ones.  The  association  picked  by  each  method  is  the 
one  that  produces  the  maximum  value  of  MI  index  or  minimum  ID  index.  As  can  be  seen  in  Table  2,  the  UT-MI 
index  is  the  one  that  performs  the  best  among  the  ones  considered  in  this  paper.  It  has  a  resolution  of  0.30997  degrees 
separation  in  true  anomaly.  It  is  important  to  note  that  while  this  number  is  relatively  large  (~  1000  arcsec)  compared 
to  typical  sensor  resolutions,  one  has  to  remember  that  the  OTOA  problem  being  solved  is  far  more  challenging  that 
the  classical  OTTA  problem.  With  the  methods  proposed  in  this  paper,  and  others  to  be  considered  in  future  work, 
one  can  sift  through  a  collection  of  observations,  not  correlated  to  any  object  or  uncorrelated  track,  and  sort  out  which 
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ones  belong  to  each  other  under  minimal  amount  of  knowledge  of  the  observation  statistics. 


Table  2.  True  Anomaly  Difference  at  which  Association  Method  Fails 


Method 

True  Anomaly  Difference  (deg) 

UT-ID 

0.64458 

MC-ID 

0.68182 

UT-MI 

0.30997 

5.  CONCLUSION 

In  this  paper  we  considered  four  different  information  theoretic  approaches  to  the  observation-to-observation  associ¬ 
ation  problem.  Three  of  these  methods  (UT-ID,  MC-ID  and  UT-MI)  have  been  implemented  and  the  fourth  (MC-ID) 
has  been  demonstrated  to  work  on  a  non-SSA  problem  and  is  currently  being  developed  for  implementation  in  a  GPU 
environment.  It  was  shown  that  the  mutual  information  method  is  the  one  that  performed  the  best  among  the  methods 
tested.  Variations  of  these  techniques  exist  and  may  prove  to  be  even  more  accurate  than  the  ones  tested  in  this  pa¬ 
per.  It  is  generally  conjectured  that  MC-based  methods,  on  the  long  run,  will  prove  to  be  more  efficient  as  they  more 
faithfully  represent  the  underlying  distributions.  We  further  conjecture  that  the  use  of  mutual  information  will  be  more 
accurate  than  divergence-based  approaches  as  they  more  accurately  capture  degree  of  dependence  between  variables. 
This  paper  barely  touches  the  tip  of  the  iceberg,  and  its  most  basic  goal  is  to  induce  the  space  community  to  further 
consider  the  use  of  information  theoretic  measures  for  solving  the  various  data  association  problems  (OTOA,  OTTA, 
and  TTTA). 
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